Search Results
Search for: All records
Total Resources: 2

A core component present in many successful neural network architectures is an MLP block of two fully connected layers with a non-linear activation in between. An intriguing phenomenon observed empirically, including in transformer architectures, is that, after training, the activations in the hidden layer of this MLP block tend to be extremely sparse on any given input. Unlike traditional forms of sparsity, in which individual neurons or weights can be deleted from the network, this form of dynamic activation sparsity appears to be harder to exploit to obtain more efficient networks. Motivated by this, we initiate a formal study of the PAC learnability of MLP layers that exhibit activation sparsity. We present a variety of results showing that such classes of functions do lead to provable computational and statistical advantages over their non-sparse counterparts. Our hope is that a better theoretical understanding of sparsely activated networks would lead to methods that can exploit activation sparsity in practice.
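
To make the dynamic-sparsity phenomenon concrete, here is a minimal NumPy sketch of such an MLP block that measures, for each input, the fraction of hidden units the ReLU leaves at exactly zero. The layer widths, random weights, and batch are illustrative assumptions, not taken from the paper (at random initialization a ReLU zeroes roughly half the units; the extreme sparsity described above is what is observed after training):

```python
import numpy as np

# Sketch of the MLP block described above: two fully connected layers
# with a ReLU non-linearity in between. All dimensions and weights here
# are illustrative assumptions.
rng = np.random.default_rng(0)
d_model, d_hidden = 64, 256                     # assumed layer widths
W1 = rng.standard_normal((d_model, d_hidden)) / np.sqrt(d_model)
W2 = rng.standard_normal((d_hidden, d_model)) / np.sqrt(d_hidden)

def mlp_block(x):
    """Linear -> ReLU -> Linear; returns the output and the hidden activations."""
    h = np.maximum(x @ W1, 0.0)                 # hidden-layer activations
    return h @ W2, h

x = rng.standard_normal((8, d_model))           # a batch of 8 inputs
_, h = mlp_block(x)

# Dynamic activation sparsity: per input, the fraction of hidden units
# that are exactly zero after the ReLU.
sparsity = (h == 0.0).mean(axis=1)
print("fraction of inactive hidden units per input:", sparsity)
```

The sparsity pattern changes from input to input, which is what distinguishes it from static sparsity, where a fixed set of neurons or weights could simply be pruned away.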

Angelidakis, H.; Awasthi, P.; Blum, A.; Chatziafratis, V.; Dan, C. (27th Annual European Symposium on Algorithms (ESA 2019))

We study the classic Maximum Independent Set problem under the notion of stability introduced by Bilu and Linial (2010): a weighted instance of Independent Set is γ-stable if it has a unique optimal solution that remains the unique optimal solution under multiplicative perturbations of the weights by a factor of at most γ ≥ 1. The goal then is to efficiently recover this "pronounced" optimal solution exactly. In this work, we solve stable instances of Independent Set on several classes of graphs: we improve upon previous results by solving Õ(∆/√(log ∆))-stable instances on graphs of maximum degree ∆, (k − 1)-stable instances on k-colorable graphs, and (1 + ε)-stable instances on planar graphs (for any fixed ε > 0), using both combinatorial techniques as well as LPs and the Sherali-Adams hierarchy. For general graphs, we give an algorithm for (εn)-stable instances, for any fixed ε > 0, and lower bounds based on the planted clique conjecture. As a by-product of our techniques, we give algorithms as well as lower bounds for stable instances of Node Multiway Cut (a generalization of Edge Multiway Cut), by exploiting its connections to Vertex Cover. Furthermore, we prove a general structural result showing that the integrality gap of convex relaxations of several maximization problems reduces dramatically on stable instances.

Moreover, we initiate the study of certified algorithms for Independent Set. The notion of a γ-certified algorithm was introduced very recently by Makarychev and Makarychev (2018); it is a class of γ-approximation algorithms that satisfy one crucial property: the solution returned is optimal for a perturbation of the original instance, where perturbations are again multiplicative up to a factor of γ ≥ 1 (hence, such algorithms not only solve γ-stable instances optimally, but also have guarantees even on unstable instances). Here, we obtain ∆-certified algorithms for Independent Set on graphs of maximum degree ∆, and (1 + ε)-certified algorithms on planar graphs. Finally, we analyze the algorithm of Berman and Fürer (1994) and prove that it is a ((∆+1)/3 + ε)-certified algorithm for Independent Set on graphs of maximum degree ∆ where all weights are equal to 1.
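
The γ-stability condition has a finite reformulation that a small example can make concrete: the unique optimum S* survives every γ-perturbation iff, for every other independent set S, w(S* \ S) > γ · w(S \ S*), i.e. S* still wins under the worst perturbation, which inflates the weights of S \ S* by the full factor γ. The brute-force Python sketch below checks this on a toy instance; the 4-cycle graph and its weights are illustrative assumptions, and the enumeration is exponential in the number of vertices, so it serves only to illustrate the definition:

```python
from itertools import combinations

def independent_sets(n, edges):
    """Yield all independent sets of an n-vertex graph as frozensets."""
    edge_set = {frozenset(e) for e in edges}
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if all(frozenset((u, v)) not in edge_set
                   for u, v in combinations(subset, 2)):
                yield frozenset(subset)

def is_gamma_stable(n, edges, w, gamma):
    # gamma-stable iff the optimum S* satisfies
    # w(S* minus S) > gamma * w(S minus S*) for every other independent
    # set S, i.e. S* beats S even under the worst gamma-perturbation.
    sets = list(independent_sets(n, edges))
    weight = lambda s: sum(w[v] for v in s)
    s_star = max(sets, key=weight)
    return all(weight(s_star - s) > gamma * weight(s - s_star)
               for s in sets if s != s_star)

# Toy instance: the 4-cycle 0-1-2-3-0 with one heavy vertex.
# The optimum {0, 2} has weight 4; its competitor {1, 3} has weight 2,
# so the instance is gamma-stable exactly for gamma < 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
w = {0: 3.0, 1: 1.0, 2: 1.0, 3: 1.0}
for gamma in (1.5, 2.0, 4.0):
    print(gamma, is_gamma_stable(4, edges, w, gamma))
```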